Boost.Coroutine

November 7th, 2009

A while back, I wrote about cooperative threading libraries for C++. Coroutines are a closely related concept—coroutines and cooperative threads can be expressed/implemented in terms of each other.

Conspicuously absent from that coverage is Boost.Coroutine, which I’ll discuss here. The problem with Boost.Coroutine is that it is incomplete and, last I checked, still a long way from being finished. I spent some time trying to work with the author through its non-starter issues, since I was looking forward to using it with Boost.Asio (one of Boost.Coroutine’s primary objectives), but the author has not had the time to take his work to the Boost formal review stage.

Re-enabling desktop effects in Windows 7

November 7th, 2009

Every once in a while, Windows 7’s Desktop Window Manager (DWM) spontaneously goes funky on me and loses its effects (transparency, blurring, shadows).

When this happens, you can get your effects back by going to Services and restarting the “Desktop Window Manager Session Manager” service, or by going to an Administrative command prompt and running:

net stop uxsms
net start uxsms

C++0x lambdas in GCC 4.5

November 3rd, 2009

Lambdas are coming in GCC 4.5. With this I’ve rid myself of the clamp hack. You can start using lambdas today by just grabbing a GCC snapshot:

svn co -q svn://gcc.gnu.org/svn/gcc/trunk
cd trunk
sudo aptitude build-dep gcc-4.4 # get prereqs
./configure --program-suffix=-4.5 --disable-libgcj \
--enable-languages=c++
make
sudo make install
# test it out
echo 'int main() { []{}; return 0; }' > /tmp/nop.cc
g++-4.5 -std=gnu++0x -o /tmp/nop{,.cc}

Here’s a simple program from my C++ sandbox showing off just a few of C++0x’s slew of new features:

// some libstdc++ builds only provide this_thread::sleep_for when this is defined
#define _GLIBCXX_USE_NANOSLEEP
#include <chrono>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using namespace std;
int main() {
  // strongly typed enums
  // can specify underlying type
  enum class msg_type : uint8_t { hello, world };
  // initializer lists for easy init
  // no more `>>` misparsing
  const vector<pair<int, string>> xs =
    {{1, "hello"}, {3, "world"}};
  // standard classes for concurrency
  vector<thread> ts;
  // type inference with repurposed `auto`
  // no need for `vector<pair<int, string>>::const_iterator`
  for (auto x = xs.begin(); x != xs.end(); ++x) {
    // new function class
    // could also use `auto` here
    // here's a lambda that captures x by value
    function<void()> f = [=]{
      // new chrono class
      std::this_thread::sleep_for(
        chrono::seconds(x->first));
      cout << x->second << endl;
    };
    // thread is moveable
    // no worry about copying/destroying here
    // enabled by rvalue references (&&)
    ts.push_back(thread(f));
  }
  for (auto t = ts.begin(); t != ts.end(); ++t) {
    t->join();
  }
  return 0;
}

Compile it with:

g++-4.5 -std=gnu++0x -pthread -o c++0x{,.cc}

Other features are coming down the pipe as well. Wikipedia has good high-level overviews of these and more.

Notes wiki

October 18th, 2009

I started using a gitit wiki to maintain my notes on CS research and programming. It’s available here. For a while, I’ve kept my notes in a bunch of “loosely pandoc” files, so gitit was an easy way to wiki-fy everything—to make them easier to view/edit from any browser, and to share publicly.

At the moment my notes are highly disorganized and probably have many formatting bugs. Also, they typically don’t include topics from classes I’ve taken. I’m hoping this will become a not-too-cryptic way for me to share more information with less effort than writing blog posts.

Getting started with Windows kernel development

September 25th, 2009

I’m diving into Windows development here in MSR Redmond’s OS group. I have mainly a Linux background and know much less about Windows, so I’ve been taking notes. I’ll be publishing an assortment of these notes/tips. Today I’ll talk about kernel-mode development and debugging basics.

Windows kernel module/driver binaries are .sys files (like .ko files on Linux), which may be accompanied by a .pdb file containing the debug symbols for the binary. Kernel Programming 101 is a nice introductory tutorial on how to write and build these drivers, but the section on loading them manually by editing the registry doesn’t work in Windows 7 (perhaps this was XP-and-earlier). You should use the DriverLoader tool from OSR Online to register your driver as a service and start it; it probably uses the Windows API to accomplish this. By the way, OSR Online is a useful website and community for Windows kernel hacking that puts out other tools as well.

windbg (“windbag”) is a handy multi-purpose debugger, included in the Windows Driver Kit. It used to be a userland-only debugger, whereas kd was the kernel debugger, but all debugging functionality has been merged into one place, used by both windbg and ntsd. ntsd is a command-line debugger, whereas windbg has a GUI and a bunch of other nifty features, such as the ability to pull symbols (those .pdb files) and sources automatically from symbol servers and source servers. I used to use windbg to analyze those crash dumps that are produced whenever you get a BSOD; the stack trace may provide hints as to the source of the problem.

In the 64-bit versions of Windows Vista and Windows 7, you can normally only load signed drivers. To disable this requirement, start Windows in debug mode. Typically you’ll want to debug a Windows environment running in a VM like Virtual PC. To debug Windows over the virtual COM serial port, the first step is to run the following from an Administrator cmd and reboot:

bcdedit -debug on
bcdedit -dbgsettings serial debugport:1 baudrate:115200

bcdedit is a program that edits boot settings, which once upon a time were configured in a file called boot.ini. There are alternative instructions on the web mentioning other things to try with bcdedit, which I haven’t looked carefully at, but if you’re doing kernel development then you may want debugging mode turned on anyway.

The next step is to go to your VM settings and configure the COM1 port to map to a named pipe, \\.\pipe\<pipe name>. Now you can start the debugger:

set _NT_SOURCE_PATH=SRV*;
set _NT_SYMBOL_PATH=srv*c:\syms*\\symbols\symbols
windbg -k com:port=\\<vpc_host_machine>\pipe\<pipe name>,pipe,resets=0,reconnect

The environment variables tell windbg about the servers from which to automatically pull symbols/sources; you’ll need to adjust this, since \\symbols\symbols is a Microsoft-internal share. resets=0 is required for Virtual PC; use resets=2 for a competing VM product. reconnect waits for a named pipe if it’s not found on the target and waits to reconnect if disconnected. vpc_host_machine is the name of the host running the VM; if you’re debugging locally, use \\.\pipe\<pipe name>.
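To make the pieces of that -k argument easier to see, here is a rough C++ sketch that assembles it from its parts. The helper name kd_pipe_arg and its parameters are made up for illustration; nothing like it ships with windbg or the WDK.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the argument that follows `windbg -k` for a
// serial connection over a named pipe.
//   host      - machine hosting the VM; pass "." when debugging locally
//   pipe_name - the pipe name configured for the VM's COM1 port
//   resets    - value for the resets= option (0 for Virtual PC, per the text)
std::string kd_pipe_arg(const std::string& host,
                        const std::string& pipe_name,
                        int resets) {
    assert(!host.empty() && !pipe_name.empty());
    return "com:port=\\\\" + host + "\\pipe\\" + pipe_name +
           ",pipe,resets=" + std::to_string(resets) + ",reconnect";
}
```

Calling kd_pipe_arg(".", "vpc", 0) gives the local form of the argument, and passing a machine name instead of "." gives the remote form.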

Side note: symbol files contain information about variable and function names. Public symbols include globals, while private symbols include everything else (locals, structs, etc.). Full symbol files contain both, whereas stripped symbol files contain just public symbols. Microsoft’s Internet symbol server is at http://msdl.microsoft.com/download/symbols. To use it, you can just leave the symbol server variable alone; I believe the default behavior is to use this public server. (I also believe the .symfix windbg command without arguments restores the default symbol path.) In my case, to also allow symbols for my own driver to be resolved, I needed to set the symbol path to the following:

srv*c:\syms*\\symbols\symbols;C:\Users\...\mydriver\objchk_win7_x86

To break down what just happened: the symbol path syntax consists of multiple semicolon-separated paths, each of which can be a local path, a cache, or a symbol server. Here are some example components:

cache*C:\syms

uses C:\syms as a cache for all components to its right.

srv*http://symserver/symbols

uses http://symserver/symbols as a symbol server. This can also be a share, i.e. \\symserver\symbols.

srv*C:\syms*http://symserver/symbols
cache*C:\syms;srv*http://symserver/symbols

are (I think) equivalent, and use C:\syms as a local cache for symbols from the remote symbol server. You can specify any number of cascading intermediate caches:

srv*C:\syms*\\symcache\symbols*http://symserver/symbols
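The syntax is simple enough to pick apart by hand. The following C++ sketch splits a symbol path into components and classifies each one as I understand the format from the examples above. It is only an illustration: the type and function names are my own, and the debugger's real parser surely handles more cases (other prefixes, defaults for empty fields) than this does.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// One semicolon-separated component of a symbol path.
struct SymComponent {
    enum class Kind { LocalPath, Cache, Server } kind;
    // LocalPath: the directory itself.
    // Cache:     the cache directory.
    // Server:    stores in lookup order; the last entry is the upstream
    //            server and any earlier entries are intermediate caches.
    std::vector<std::string> stores;
};

// Splits `text` on `sep`; always returns at least one (possibly empty) field.
static std::vector<std::string> split(const std::string& text, char sep) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = text.find(sep, start);
        if (end == std::string::npos) {
            out.push_back(text.substr(start));
            return out;
        }
        out.push_back(text.substr(start, end - start));
        start = end + 1;
    }
}

std::vector<SymComponent> parse_symbol_path(const std::string& path) {
    std::vector<SymComponent> out;
    for (const std::string& part : split(path, ';')) {
        if (part.empty()) continue;  // tolerate stray semicolons
        const std::vector<std::string> fields = split(part, '*');
        assert(!fields.empty());
        // Compare the prefix case-insensitively (both srv* and SRV* appear above).
        std::string tag = fields[0];
        for (char& c : tag)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        SymComponent comp;
        if (fields.size() > 1 && (tag == "srv" || tag == "cache")) {
            comp.kind = (tag == "srv") ? SymComponent::Kind::Server
                                       : SymComponent::Kind::Cache;
            comp.stores.assign(fields.begin() + 1, fields.end());
        } else {
            comp.kind = SymComponent::Kind::LocalPath;
            comp.stores.push_back(part);
        }
        out.push_back(comp);
    }
    return out;
}
```

Feeding it the path I used for my driver yields two components: a server component with one local cache in front of the share, followed by a plain local path.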

When debugging issues with symbols, use !sym noisy to enable verbose symbol debugging info.

The source path is used differently. I’m not sure how source lookup works for built-in Windows modules, but when debugging your own drivers, the .sys and .dll files themselves contain the full local paths to the source files, so that you can list and step through your code out-of-the-box if you’re running windbg from the same machine where you built.

Anyway, back to actual debugging: you can attach to the system by restarting the VM. Note that windbg can only attach when the system starts up; you can’t attach to a running system. Once you’re in the debugger, you can hit ctrl-break to immediately break the target machine. You can try setting a breakpoint with bp, e.g., bp nt!NtReadFile. The word before the ! is the name of the module/driver (in this case we’re addressing the kernel), and the word after it is the name of the subroutine to break in. Your symbol path must be set up correctly to resolve these function names to addresses. You can also set breakpoints in your own driver, such as with bp mydriver!DriverEntry. It’s OK to set this before the driver is even loaded; bp will automatically behave like bu, which creates an unresolved breakpoint, to be resolved once the matching module is loaded. Note that if you update your sources while they’re open in windbg, you must close the file to refresh it.
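If the idea of an unresolved breakpoint seems mysterious, a toy model may help. The C++ sketch below records breakpoints written as module!symbol and resolves them only when a matching module shows up. It is purely illustrative bookkeeping with invented names and addresses, and says nothing about how windbg actually implements bu.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model of a deferred breakpoint (not windbg's implementation).
struct Breakpoint {
    std::string module;        // text before '!'
    std::string symbol;        // text after '!'
    bool resolved = false;
    std::uint64_t address = 0; // meaningful only once resolved
};

class BreakpointTable {
public:
    // Records a breakpoint given as "module!symbol". It stays unresolved
    // until on_module_loaded() sees the module.
    void add(const std::string& spec) {
        const std::string::size_type bang = spec.find('!');
        assert(bang != std::string::npos && "expected module!symbol");
        Breakpoint bp;
        bp.module = spec.substr(0, bang);
        bp.symbol = spec.substr(bang + 1);
        bps_.push_back(bp);
    }

    // Called when a module loads, with that module's symbol-to-address map.
    // Returns how many pending breakpoints were resolved by this load.
    int on_module_loaded(const std::string& module,
                         const std::map<std::string, std::uint64_t>& symbols) {
        int count = 0;
        for (Breakpoint& bp : bps_) {
            if (bp.resolved || bp.module != module) continue;
            const auto it = symbols.find(bp.symbol);
            if (it == symbols.end()) continue;
            bp.address = it->second;
            bp.resolved = true;
            ++count;
        }
        return count;
    }

    const std::vector<Breakpoint>& all() const { return bps_; }

private:
    std::vector<Breakpoint> bps_;
};
```

Adding "mydriver!DriverEntry" before the driver exists leaves the entry pending; it gets an address only when on_module_loaded is called for mydriver with a symbol map that contains DriverEntry.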

To continue, enter g. Play around with the debugger, and consult the documentation; the WDK is well-documented.

IE and Chrome

September 17th, 2009

IE7 (2006) introduced Protected Mode, which runs browser tabs in separate, sandboxed processes to isolate tabs from each other and from the OS, communicating with central Broker processes that remain privileged and provide system services. Chrome’s architecture was designed similarly.

IE7’s Protected Mode uses Vista’s Mandatory Integrity Control (MIC), which prevents low-integrity processes from modifying files at higher integrity levels, so that they can write only to locations marked as low, such as Temporary Internet Files. Files touched by processes are marked with their integrity level. Integrity Levels correspond to Internet Zones. Chrome on Vista also leverages MIC.
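The core rule MIC enforces by default is "no write up": a process may not modify an object labeled with a higher integrity level than its own. Here is a deliberately simplified C++ sketch of that one check. The enum and function names are mine, and real access checks also involve DACLs, privileges, and other policy bits that this ignores.

```cpp
#include <cassert>

// Integrity levels from lowest to highest. A Protected Mode browser
// process runs at Low; ordinary user processes run at Medium.
enum class Integrity { Low = 0, Medium = 1, High = 2, System = 3 };

// No-write-up: writing is allowed only when the process's level is at
// least the level the object is labeled with.
bool can_write(Integrity process, Integrity object) {
    return static_cast<int>(process) >= static_cast<int>(object);
}

// A few examples of the rule, checked with assert.
void mic_examples() {
    assert(can_write(Integrity::Low, Integrity::Low));      // e.g. a low-labeled cache dir
    assert(!can_write(Integrity::Low, Integrity::Medium));  // e.g. the user's documents
    assert(can_write(Integrity::Medium, Integrity::Low));   // writing down is fine
}
```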

IE8 introduced InPrivate, among a bevy of other features. InPrivate is similar to Incognito browsing in Chrome.

Application-based packet filtering on Linux

September 14th, 2009

iptables can’t filter on process ID or any other “direct” application identifier, which means you can’t say things like, “Allow only Firefox to send/receive any packets.” However, it can filter on user/group ID, allowing you to do user-based packet filtering, so that you can at least restrict applications if you run them as a certain uid/gid. The owner module (xt_owner) matches the owner of the socket (see man iptables for more details).

# iptables -m owner --help
iptables v1.4.4
[...]
owner match options:
[!] --uid-owner userid[-userid]      Match local UID
[!] --gid-owner groupid[-groupid]    Match local GID
[!] --socket-exists                  Match if socket exists

Of course, this all applies only to local sockets; if this system is serving as a router for other hosts, then you don’t have the uid/gid information for their sockets (if their OS even has those notions).
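To show what such a rule looks like, here is a small C++ sketch that builds the iptables command line for a given uid. The helper owner_rule is hypothetical, as is the uid 1001 used below; the flags it emits (-A, -m owner, --uid-owner, -j) are standard iptables options. Note that the owner match is only valid in the OUTPUT and POSTROUTING chains, so it filters what a user sends, not what arrives.

```cpp
#include <cassert>
#include <string>

// Builds an iptables command that matches locally generated packets by the
// uid that owns the sending socket, appended to the OUTPUT chain.
//   negate - when true, prepends `!` so the rule matches every uid
//            except `uid`
//   target - the rule target, e.g. "ACCEPT" or "DROP"
std::string owner_rule(unsigned uid, bool negate, const std::string& target) {
    assert(!target.empty());
    std::string rule = "iptables -A OUTPUT -m owner ";
    if (negate) rule += "! ";
    rule += "--uid-owner " + std::to_string(uid);
    rule += " -j " + target;
    return rule;
}
```

For example, owner_rule(1001, true, "DROP") produces a rule that drops outbound packets from every local user other than 1001. A real ruleset would need exceptions (root, loopback, and so on) before anything like that is safe to apply.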

HOWTO: Crack Yahoo’s Intranet

August 21st, 2009

Ben Reed gave me a run-down of how he managed to win Yahoo’s internal Crack Day by getting into the corporate network and making commits to their code repositories, all using off-the-shelf tools and without writing a line of code. These flaws have been fixed by now.

The first step is to associate with the wireless network. The network is secured using WEP, which is straightforward to crack using WEPCrack. This can be done from anywhere near the campus premises, and it took Ben at most 30 minutes.

Yahoo uses Cisco VPN and Aruba VPN. Cisco VPN turns out to be IPsec with extensions, and Yahoo’s is configured to use pre-shared keys (as opposed to certificates/PKI). It’s also configured to use aggressive mode for a faster three-packet initial key exchange, as opposed to main mode, which uses six packets but is more secure. From the excellent NIST Guide to IPsec VPNs:

…unlike main mode, aggressive mode can be used with pre-shared secret key authentication for hosts without fixed IP addresses. However, with the increased speed of aggressive mode comes decreased security. Since the Diffie-Hellman key exchange begins in the first packet, the two parties do not have an opportunity to negotiate the Diffie-Hellman parameters. Also, the identity information is not always hidden in aggressive mode, so an observer could determine which parties were performing the negotiation. (Aggressive mode can conceal identity information in some cases when public keys have already been exchanged.) Aggressive mode negotiations are also susceptible to pre-shared key cracking, which can allow user impersonation and man-in-the-middle attacks. Another potential issue is that while all IPsec devices must support main mode, aggressive mode support is optional. Unless there are performance issues, it is generally recommended to use main mode for the phase one exchange.

The flaws were pointed out back in 1999:

Aggressive mode does not usually provide identity protection, as this option is not required to be implemented. The identities can be exchanged in the clear, before a common shared-secret has been established. This is considered a feature for mobile users. Yet it is mobile users who are most likely to be affected by eavesdropping on wireless links. Such revealed identities are long-term liabilities. Compromised identities continue to be useful to an adversary until all participants have revoked the associated permissions. Identity attacks are extremely easy and may be mounted from anywhere on the Internet. Moreover, the revealed identities might be encrypted in other exchanges. This provides a ripe opportunity for cryptanalysis of those exchanges. This fundamental design flaw is inherent in the specification, and remediation will require removal of the aggressive mode feature.

Windows laptops are required to use the Aruba VPN client, of which we know nothing. Only Macs use the Cisco VPN client, so we need to find some Macs; this can be done with passive OS fingerprinting tools like p0f by Michal Zalewski.

Now we can use ARP spoofing to fool the Macs into thinking we’re the gateway and sending all their packets through us. Ben used ARPoison. Make sure you’ve activated IP forwarding on your system so that you actually route packets.

Once we’re the man in the middle, we can use FakeIKEd to nab the credentials:

FakeIKEd, or fiked for short, is a fake IKE daemon supporting just enough of the standards and Cisco extensions to attack commonly found insecure Cisco PSK+XAUTH VPN setups in what could be described as a semi MitM attack. Fiked can impersonate a VPN gateway’s IKE responder in order to capture XAUTH login credentials; it doesn’t currently do the client part of full MitM.

One problem is that you need to tell fiked the shared key. You can get this by grabbing a copy of Yahoo’s VPN client from Yahoo Frontyard, which is Yahoo’s external-facing site for employees. However, the site requires a valid login and is served over HTTPS. The trick is that it installs an authenticator cookie, which is also sent along with non-HTTPS requests to other yahoo.com sites.

Note that the VPN system has since been reconfigured, and now uses certificates/PKI to avoid MITM attacks. Furthermore, the VPN authentication has been augmented with RSA SecurID, which provides a rolling token for two-factor authentication. This complicates the attack, though SecurID is still vulnerable to MITM attacks executed within the appropriate timeframe.

Once you’re in the corporate network, you can mount user home directories which are exported over NFS. Permissions are not enforced over NFS (which is designed to be used by a trusted set of hosts, but apparently the NFS servers here don’t use any such list of hosts), so you can assume any user ID and touch anyone’s files—including their SSH authorized_keys file. Simply drop your public key in ~filo/.ssh/authorized_keys, and you can now log in as David Filo and (among other things) commit code.

Yahoo has since disabled the ability for sshd to use users’ authorized_keys files and instead has a separate mechanism for adding public keys.

Windows 7 non-impressions

July 21st, 2009

I’ve been using the free beta release of Windows 7 on my Lenovo X41 Tablet, and during this time I’ve found that I really don’t need that many local applications installed. It’s nice and simple, and everything runs smoothly and quickly (most notably, suspend/hibernate). No IBM/Lenovo software accessories or manually installed device drivers to set up or get in the way. I think the only thing which I could imagine missing is the IBM accelerometer-based hard disk protector.

I installed the system by blindly following these instructions to create a bootable USB from the DVD ISO under Windows XP (most instructions assume Windows Vista or Windows 7). The only hitch during setup was that the system doesn’t have out-of-box support for my Intel PRO/Wireless 2200BG network adapter; I needed to run Windows Update over an Ethernet cable first. Another annoyance, but unrelated to Windows 7, was that I had to defragment my (NTFS-formatted) partition some ten times over before I could ntfsresize it (with an Ubuntu live USB) to make space for Windows 7.

Here’s the shortlist of applications that I actually installed:

  • Chrome: “mostly-FLOSS” browser of choice
  • Flash: not FLOSS, but your alternative is to live under a rock
  • PuTTY: standard SSH client
  • Sumatra PDF: a PDF viewer that’s much lighter weight than Acrobat

And that’s it! No Java. No IM or email client: these run on my main desktop, but even if they didn’t, I’d just use Gmail/Meebo/etc. No IDE or dev tools, since I do most of my development over SSH. No office suite, either: I just don’t use these applications much anymore, probably because I’ve moved most of my document authoring to LaTeX or Pandoc or what have you. I imagine I’ll be crawling back to PowerPoint 2007 once I have to crank out another presentation, though; it’s just easy, fast, and pretty.

I didn’t even need to download the Consolas font, since it’s included in Windows 7.

Over time, I can imagine myself also installing the following:

  • Filezilla: use it mainly for SFTP
  • TightVNC: so I can connect to my desktop for apps like Thunderbird
  • VideoLAN Client: because it somehow manages to play any video I throw at it, but until I actually need to watch a movie on my laptop, I usually find myself watching Hulu and YouTube
  • Vuze: downloading media wirelessly may be safer, and Vuze is still the best BT client with features that matter (faster downloads, resilient tracking, search); goes well with PeerGuardian

I thought I’d have more to say about Windows 7 by this point, but I honestly still haven’t experienced it that deeply. (The fact that it’s out of my way is just as well a Good Thing, but where’s the fun in that?) That, plus the fact that I don’t have many local apps, lets me realize first-hand that the significance of the laptop/netbook OS is becoming marginalized, and that Chrome OS/Android/Linux really could become relevant. The paradox is that native mobile device apps—like those written for the iPhone OS—are popular and preferred over their web counterparts, and I think that has more to do with generally less-connected usage of these devices and the not-quite-as-slick UI of most web apps.

I elected to try Windows 7 primarily for fun, but also to keep at least some part of myself familiar with the dominant desktop OS of the world, and lastly because it just has better tablet support, battery life, and suspend/hibernate compared to my prior experiences with Linux on the laptop.

GRE e-rater

June 2nd, 2009

It turns out that ETS has a whole research division, including researchers in natural language processing who come up with stuff like e-rater and other machine essay graders, and that they publish about these systems.

According to the latest system description paper:

The feature set used with e-rater V.2 include measures of grammar, usage, mechanics, style, organization, development, lexical complexity, and prompt-specific vocabulary usage.

E-rater is part of Criterion, a web-based service that provides students with instant scoring and feedback on their submitted essays. Criterion has a number of writing analysis tools whose outputs form the feature vector used by e-rater. The score is a simple weighted average of the feature values.
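To make "weighted average" concrete, here is a minimal C++ sketch of that kind of scoring function. The feature values and weights in the usage note are invented, and the real system presumably standardizes features and scales the result to the essay score range, which this skips.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted average of feature values: sum(w_i * x_i) / sum(w_i).
// Both vectors must have the same length, and the weights must not
// sum to zero.
double weighted_average(const std::vector<double>& features,
                        const std::vector<double>& weights) {
    assert(features.size() == weights.size());
    double total = 0.0;
    double weight_sum = 0.0;
    for (std::size_t i = 0; i < features.size(); ++i) {
        total += weights[i] * features[i];
        weight_sum += weights[i];
    }
    assert(weight_sum != 0.0);
    return total / weight_sum;
}
```

With made-up feature values {1, 2, 3} and weights {1, 1, 2}, the result is (1 + 2 + 6) / 4 = 2.25. Choosing the weights is where the judgmental control comes in.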

One noteworthy detail is that in determining the parameters to use for this model, e-rater eschews exclusively statistical machine learning (optimization) approaches in favor of allowing judgmental control, for reasons of control (to avoid unintentional skew and other undesirable statistical effects) and transparency (to make the system easier to understand and explain).

It would be interesting to see how straightforward it is to game e-rater, given the above information and access to the implementation in Criterion.